DM-46990: Add configuration for embargo SIA #26

Merged: 7 commits, Dec 18, 2024

dhirving
Contributor

Add a configuration to use for the SIA service pointing to the embargo repository at USDF.

Add an initial template for an SIA configuration to use with the embargo repo at USDF.

codecov bot commented Dec 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.29%. Comparing base (e923387) to head (0e83c56).
Report is 8 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #26   +/-   ##
=======================================
  Coverage   82.29%   82.29%           
=======================================
  Files          18       18           
  Lines        1079     1079           
  Branches      174      174           
=======================================
  Hits          888      888           
  Misses        166      166           
  Partials       25       25           


@@ -0,0 +1,71 @@
facility_name: Rubin-LSST
obs_collection: LSST.Embargo
collections: [<!!!! JIM PLEASE FILL IN LIST OF COLLECTIONS CONTAINING IMAGES OF INTEREST !!!!>]
Contributor Author

@TallJimbo I think we just need two things from you:

  1. A list of image dataset types from the embargo repo that will be of interest to Rubin staff
  2. A list of collections containing those images

The interface they will be accessed through is similar to this one:
[image]

This older config for LATISS images may be useful inspiration: https://github.com/lsst-dm/dax_obscore/blob/main/configs/usdf-embargo-live.yaml

Contributor Author

(If you just give me the dataset type names I can work with Gregory to figure out the rest of the config associated with them.)

Member

Dataset type names: raw, postISRCCD, calexp, pvi, deepCoadd, deepCoadd_calexp, goodSeeingCoadd, goodSeeingDiff_differenceExp

Note that this is definitely a list intended for a staff RSP; science users will not see all of these dataset types, as some of them are only useful when something has gone wrong and you want to see where it happened. It also does not include calibration frames; if there's a desire to include those, I'll have to get someone else to give you a list.

Collections:

  • Source of truth is https://rubinobs.atlassian.net/wiki/spaces/DM/pages/48834013/Campaigns
  • I think we're interested in the prompt processing, nightly-validation, and intermittent DRP campaigns here.
  • This resolves to the following collection names: LSSTComCam/prompt/output-<day_obs>, LSSTComCam/nightlyValidation, LSSTComCam/runs/DRP/<data-date-range>/w_2024_XX/DM-XXXXX.
  • We might want to publish the set of daily LSSTComCam/runs/nightlyValidation/{day_obs}/<lsst_distrib_tag>/DM-XXXXX collections instead of the umbrella LSSTComCam/nightlyValidation collection to keep butler queries simpler.

Contributor Author

Which aspect of the queries are you referring to when you say "keep butler queries simpler"?

Member

@TallJimbo Dec 12, 2024

The umbrella collection is a CHAINED collection with one member for each day, and (roughly) the same dataset types in each member. So the collection summaries will not help much in collapsing queries down to fewer actual RUN collections.
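
The point about collection summaries can be illustrated with a toy model. This is only a sketch: plain dicts stand in for collections and their summaries, the run names are made up, and `prune_by_summary` is a hypothetical helper, not part of the Butler API. It shows that a summary can only rule out a RUN collection when that collection lacks the requested dataset type.

```python
# Toy model only: dicts stand in for collections; this is not the daf_butler API.

def prune_by_summary(chain, summaries, dataset_type):
    """Return the RUN collections in `chain` whose summary lists `dataset_type`."""
    return [run for run in chain if dataset_type in summaries[run]]


# Hypothetical daily runs that are members of one umbrella chain.
daily_runs = [f"nightlyValidation/2024120{day}" for day in range(1, 6)]

# Case 1: every daily run holds the same dataset types, as described above.
uniform = {run: {"raw", "postISRCCD", "calexp"} for run in daily_runs}

# Case 2: only the last run holds calexp, so the summaries can rule out the rest.
sparse = {run: {"raw"} for run in daily_runs}
sparse[daily_runs[-1]] = {"raw", "calexp"}

uniform_hits = prune_by_summary(daily_runs, uniform, "calexp")
sparse_hits = prune_by_summary(daily_runs, sparse, "calexp")

print(len(uniform_hits), "runs still searched with uniform summaries")
print(len(sparse_hits), "run still searched with sparse summaries")
```

With uniform members the query still has to visit every daily run, which is the cost that publishing the daily collections directly would at least make explicit.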

configs/embargo-sia.yaml (outdated)
It looks like the SIA code is not currently set up to handle these. It's not querying the globs at all, and even if it were, it does a find-first search, which is not compatible with glob searches.
@dhirving
Contributor Author

There's a bit of an impedance mismatch between the way the dax_obscore SIA code wants to handle collections and the collection structure of LSSTComCam/runs/DRP/... and LSSTComCam/prompt/... .

The SIA query is using a find_first search, which isn't going to work with re-runs having the same data IDs in multiple collections in LSSTComCam/runs/DRP -- since it will be arbitrary which collection is first. My intuition would be that it should not be using find_first, so that we just find all images in all collections. But there wouldn't be any way to distinguish images from different collections from each other in the results, currently.
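
A toy model of the two search modes may make the trade-off concrete. It is a sketch under stated assumptions: `search` is a hypothetical function, the collection names and data IDs are invented, and nothing here calls the real Butler or SIA code. It shows that a find-first result depends on collection order, while a find-all result needs the collection name to tell duplicates apart.

```python
# Toy model only: each collection maps a data ID to a stored image label.

def search(collections, store, find_first):
    """Search `collections` in order.

    With find_first=True, keep only the first match for each data ID.
    With find_first=False, return every match from every collection.
    """
    results = []
    seen = set()
    for name in collections:
        for data_id, image in store[name].items():
            if find_first and data_id in seen:
                continue
            seen.add(data_id)
            results.append((name, data_id, image))
    return results


# Two hypothetical DRP re-runs that processed the same (visit, detector).
store = {
    "DRP/run_a": {(100, 4): "calexp from run_a"},
    "DRP/run_b": {(100, 4): "calexp from run_b"},
}

first_ab = search(["DRP/run_a", "DRP/run_b"], store, find_first=True)
first_ba = search(["DRP/run_b", "DRP/run_a"], store, find_first=True)
all_matches = search(["DRP/run_a", "DRP/run_b"], store, find_first=False)

print(first_ab[0][2])    # winner depends on which collection is listed first
print(first_ba[0][2])
print(len(all_matches))  # both images returned; only the collection name differs
```

In the find-all case the two rows share a data ID, so a result format without a collection column could not distinguish them, which is the limitation noted above.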

Perhaps better would be if the Butler collections were exposed as ObsCore collections, so that the caller could specify which collections they wanted to search in. But the code isn't set up to do that right now -- the ObsCore collection name is just a static string defined in the config file.

We might consider creating some chained collections in the embargo repo to serve as explicit top-level collections for SIA searches, which would make things easier to administer.

In any case the LSSTComCam/nightlyValidation chained collection seems well behaved with the current code and at least has raw, postISRCCD, and calexp in it. So that should be sufficient to unblock Stelios and Trey for the moment.

Tweaking the SIA code itself is a bigger scope than I want to jump in and help with right now, so getting access to the other collections might need to wait until Tim is back.

@dhirving dhirving marked this pull request as ready for review December 12, 2024 22:59
@TallJimbo
Member

Oh, OK. If the collections here are all queried together in the end anyway, then I'd recommend using the most recent DRP collection as the only one until we can rework things as you've described. It runs a superset of the tasks in nightly-validation with consistent code and configuration, and now that we're no longer on-sky there's no real value in automatically picking up new processing from prompt or nightly-validation.

@dhirving
Contributor Author

OK, that should get us a little further. The existing SIA code has the ability to override the collection config on the fly, so Stelios can add some code to the SIA service to locate and use the latest DRP collection.

@dhirving
Contributor Author

Or I guess we could ask the campaign team to maintain a chained collection with the latest DRP as the only child.

@dhirving
Contributor Author

cc/ @stvoutsin

@stvoutsin
Contributor

> OK, that should get us a little further. The existing SIA code has the ability to override the collection config on the fly, so Stelios can add some code to the SIA service to locate and use the latest DRP collection.

Just to make sure I understand what this means:

  • Locate the latest, something like:
drp_collections = butler.registry.queryCollections("LSSTComCam/runs/DRP/%")
latest_drp = max(drp_collections)

(Don't know if this actually works)

  • Use the latest
    Replace what is in collections in the obscore_config with this before sending it off to siav2_query? (or append to it?)

Where is the DRP collection path/string configured? Do I parse this from the obscore_config you will be publishing here, or is it something that needs to be part of the application configuration?
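
The two steps in this comment can be sketched without a Butler at all, working on a plain list of collection names. Everything here is an assumption for illustration: `latest_drp` and `with_collections` are hypothetical helpers, the example names (date ranges and ticket numbers) are made up, and the config is reduced to a dict. The sketch uses shell-style `*` wildcards via `fnmatch`; which wildcard syntax the Butler registry accepts should be checked against its documentation. It also sorts on the weekly tag rather than calling `max()` on the raw strings, because a plain string comparison would rank names by the date-range component first.

```python
import fnmatch
import re

# Matches the weekly release tag, e.g. "/w_2024_48/", inside a collection name.
WEEKLY = re.compile(r"/w_(\d{4})_(\d{2})/")


def latest_drp(names, pattern="LSSTComCam/runs/DRP/*"):
    """Pick the matching collection with the newest weekly tag (w_YYYY_WW)."""
    candidates = []
    for name in names:
        if not fnmatch.fnmatchcase(name, pattern):
            continue
        match = WEEKLY.search(name)
        if match:
            year, week = (int(part) for part in match.groups())
            candidates.append(((year, week), name))
    if not candidates:
        raise LookupError(f"no collection matches {pattern!r}")
    return max(candidates)[1]


def with_collections(config, collections):
    """Return a copy of a config mapping with its `collections` entry replaced."""
    return {**config, "collections": list(collections)}


# Made-up names that follow the pattern quoted earlier in the thread.
names = [
    "LSSTComCam/runs/DRP/20241101_20241113/w_2024_46/DM-00001",
    "LSSTComCam/runs/DRP/20241101_20241127/w_2024_48/DM-00002",
    "LSSTComCam/nightlyValidation",
]
config = {
    "facility_name": "Rubin-LSST",
    "obs_collection": "LSST.Embargo",
    "collections": [],
}

updated = with_collections(config, [latest_drp(names)])
print(updated["collections"])
```

Replacing the list, rather than appending to it, keeps older re-runs out of the search, which matters while the query is find-first.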

@dhirving
Contributor Author

It would be something approximately like that, yeah. It would have to be a hack that's part of the application configuration.

But hopefully you don't have to do anything -- after no-Slack-day ends I'm going to ask Campaign Management to maintain a collection for this, which is easier for you and will give us a way to keep failed and in-progress pipeline runs out of SIA.

With the code in this PR as-is, you should have enough to start working on getting the service set up at USDF.

@dhirving
Contributor Author

since you're looking at this PR right now:

I think the configuration will be pretty much identical to IDF, except the Butler repository is called embargo, and the Butler server domain names are the USDF RSP domains instead of the IDF ones.

@stvoutsin
Contributor

> since you're looking at this PR right now:
>
> I think the configuration will be pretty much identical to IDF, except the Butler repository is called embargo, and the Butler server domain names are the USDF RSP domains instead of the IDF ones.

OK, sounds good. A preliminary branch with the phalanx/sia configuration for this is here:
lsst-sqre/phalanx@main...tickets/DM-48143
I'm not sure the butler path is correct there, though. I might add you as a reviewer once I open the PR, if that's OK.

@dhirving dhirving requested a review from gpdf December 16, 2024 16:54
@dhirving
Contributor Author

Campaign Management has agreed to manage a collection LSSTComCam/SIA pointing to the latest DRP. So that is what we'll serve from SIA initially. We'll need to revisit this once we start taking LSSTCam data.

@dhirving
Contributor Author

I am going to merge this so that it goes into tonight's build of dax_obscore. Gregory, if any of the IVOA bits are not right let me know and I'll fix them in a later PR.

@dhirving dhirving merged commit feec90d into main Dec 18, 2024
15 checks passed
@dhirving dhirving deleted the tickets/DM-46990 branch December 18, 2024 17:15